Abstract

Two classical hypotheses about population growth in a system of cities are
examined. Hypothesis 1 pertains to Gibrat's and Zipf's theory, which states
that the city growth-decay process is size independent. Hypothesis 2 pertains
to the so-called Yule process, which states that the growth of populations in
cities happens when (i) the distribution of the initial city population sizes
obeys a log-normal function, and (ii) the growth of the settlements follows a
stochastic process. The test is based on official data on Bulgarian cities at
various times. This system was chosen because (i) Bulgaria is a country for
which one does not expect biased theoretical conditions; (ii) the city
populations were determined rather precisely. The present results show that:
(i) the population size growth of the Bulgarian cities is size dependent,
whence Hypothesis 1 is not confirmed for Bulgaria; (ii) the population sizes
of Bulgarian cities can be described by a double Pareto log-normal
distribution, whence Hypothesis 2 is valid for the Bulgarian city system. It
is expected that this study sheds some light on the city systems of other
countries that are usually considered more pertinent.


1. Introduction 


The rapid development of the methods of nonlinear dynamics and of time series
analysis has led to many applications of new and classic methodology to
problems of science, society and economics. A large number of applications
are devoted to nonlinear problems of economic geography (Sheppard (1982); Puu
and Panchuk (1991)). Below we discuss a characteristic that seems important
for understanding the evolution of complex economic or social systems of a
region, a country or a system of countries, i.e. the population size of
cities in a well defined geographic area.

Let us consider a city. In order to explain the size of the population of the
city, we have to account for its geographic location, economic development,
etc. The evolution of a city population is affected by many factors, e.g.,
impulses generated by the city and its hinterland, interurban dependencies,
or "shocks" from outside the city system (Krugman (1996)). Let us turn now to
a system of cities. As in the case of a single city, the economic, geographic
and many additional factors are also important for the growth (or decay) of
the populations of city systems. As a city system is distributed over a
geographic region, there are variations in the above factors. These
variations can be modeled by stochastic processes. Thus, if we are interested
in a city population size distribution (Cordoba (2008a,b)), the building of a
theory can start from appropriate stochastic processes rather than from model
equations for the economic, geographic and other factors. In other words, in
order to explain the distribution of the city population sizes in a
geographic region of a city system, it seems that a mathematical model based
on stochastic processes with appropriate characteristics is in order (Seto
and Fragkias (2005); Vitanov et al. (2007); Glaeser (2008); Vespignani
(2009)).


In the course of time, the cities in a country develop a hierarchy. An
expression of this hierarchy is the city population size distribution that
can be easily constructed for any urban system. Zipf (Zipf (1949); Ioannides
and Overman (2003)) suggested that a large number of observed city population
size distributions could be approximated by a simple scaling (power) law

$$ N_r = \frac{C}{r^{\beta}} \qquad (1) $$

where $N_r$ is the population of the r-th largest city, $C$ is some
"constant", the value of which is constrained by a normalization condition,
and $\beta \simeq 1$. Eq. (1) is called the rank-size scaling law. Zipf
suggested that the particular case $\beta = 1$ represents a desirable
situation (rank-size rule), in which the forces of concentration balance
those of decentralization. It has been observed that the urban population
size distributions in developed countries, like the USA, fit the rank-size
rule very well over several decades (Krugman (1996); Madden (1956)). To
broaden the view on other cases of interest, i.e. where a power law, as in
Eq. (1), is found, let us mention work by Jiang and Jia (2011) on the USA,
Knudsen (2001) on Denmark, Moura (2006) on Brazil, Gangopadyay and Basu
(2009) on India and China, and Peng (2010) on China.


In this paper, two hypotheses are tested about the growth of a population
in a system of cities. For the test, we have selected the city system of
Bulgaria. One reason stems from the fact that the population of Bulgarian
cities can be determined very precisely; in particular, the "city" is well
defined: there is no suburban area, as often found in large or populous
countries such as, for example, the USA, India, China, France, or Italy.
The data set consists of the yearly count of the population of all Bulgarian
cities from 2004 till 2011, as recorded by the National Statistical Institute
of the Republic of Bulgaria (http://www.nsi.bg).

Thus, the hypotheses to be discussed are:

• Hypothesis 1: The growth rate of the "rescaled city population" is
independent of the size of the population.

• Hypothesis 2: The settlement formation follows a Yule process, in which
the initial populations of the settlements are distributed according to the
log-normal function. The evolution of the city populations of the formed
settlements follows a stochastic process, like the geometric Brownian
motion.

Hypothesis 1 is a formulation of Gibrat's law, which leads to power law
distributions of the rescaled sizes of a system of cities. Hypothesis 2 gives
the necessary conditions for describing the distribution of the (non-rescaled)
city population sizes of a system of cities by the double Pareto log-normal
distribution.
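The growth mechanism assumed in Hypothesis 2 can be illustrated by a minimal simulation sketch. It assumes, following the derivation attributed to Reed and Jorgensen (2004), that the age of a randomly chosen settlement in a Yule process is approximately exponentially distributed; the ages are therefore sampled directly instead of simulating the Yule process itself. The function name `simulate_city_sizes` and all parameter values are hypothetical and are not taken from the Bulgarian data.

```python
import math
import random

def simulate_city_sizes(n_cities, nu, tau, mu, sigma, lam, seed=0):
    """Sample city sizes under the assumptions of Hypothesis 2.

    nu, tau   : mean and standard deviation of ln(initial size)
    mu, sigma : drift and volatility of the geometric Brownian motion
    lam       : rate of the exponential age distribution (assumed here
                to approximate settlement ages in a Yule process)
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_cities):
        log_x0 = rng.gauss(nu, tau)      # log-normal initial population
        age = rng.expovariate(lam)       # time since the settlement formed
        # Exact solution of geometric Brownian motion observed at time `age`.
        log_x = (log_x0 + (mu - 0.5 * sigma ** 2) * age
                 + sigma * math.sqrt(age) * rng.gauss(0.0, 1.0))
        sizes.append(math.exp(log_x))
    return sizes

# Hypothetical parameter values, chosen only for illustration.
sizes = simulate_city_sizes(20000, nu=8.0, tau=0.5, mu=0.03, sigma=0.2, lam=0.05)
mean_log_size = sum(math.log(s) for s in sizes) / len(sizes)
```

Under these assumptions the expected value of ln(size) is nu + (mu - sigma^2/2)/lam, which gives a simple consistency check on the sampled sizes before any distribution is fitted to them.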


2. Test of Hypothesis 1 

The rescaled size of the i-th city is defined as

$$ S_i(t) = N_i(t)/N(t), \qquad (2) $$

at time $t$, where $N_i(t)$ is the population of the i-th city and $N(t)$ is
the population of all cities in year $t$. The (rescaled city) sizes are
ranked such that $r = 1, 2, \ldots$ with $N_r(t)/N(t) > N_{r+1}(t)/N(t)$. In
so doing, the (rescaled) rank ($r$)-size relationship reads

$$ \ln(r) = \alpha - \beta \ln(S) \qquad (3) $$

where $\alpha$ and $\beta$ are parameters.
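As an illustration of how the parameters of the rank-size relationship, Eq. (3), can be estimated, the following sketch performs an ordinary least-squares fit of ln(r) against ln(S). The function name `rank_size_fit` and the synthetic populations are assumptions made for this example; the authors' actual fitting procedure is not specified in the text.

```python
import math

def rank_size_fit(populations):
    """Least-squares fit of ln(r) = alpha - beta * ln(S) for rescaled sizes S."""
    total = sum(populations)
    # Rescaled sizes S_i = N_i / N, sorted so that rank 1 is the largest city.
    sizes = sorted((n / total for n in populations), reverse=True)
    xs = [math.log(s) for s in sizes]
    ys = [math.log(r) for r in range(1, len(sizes) + 1)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = my - slope * mx
    return alpha, -slope   # beta is minus the slope of ln(r) versus ln(S)

# Synthetic populations obeying N_r = C / r exactly (not Bulgarian data).
synthetic = [1_000_000 / r for r in range(1, 51)]
alpha, beta = rank_size_fit(synthetic)
```

For populations that follow the rank-size rule exactly, the fit returns beta = 1, the Zipf value; applying the same function separately to each class of cities would give class-specific parameters.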

After visual inspection and much statistical analysis of the data, i.e.,
adapting the limits of the relevant intervals by trial and error, it is found
that the cities are best grouped into four classes: large cities (class 1);
medium size cities (class 2); small cities (class 3) and very small cities
(class 4). A comment on the various sizes is given below, after the data
specific analysis. Fig. 1 outlines the rank-size relationship for the
rescaled sizes $S$ of the system of Bulgarian cities in 2004. The figures for
the other analyzed years, 2005-2011, are not shown, for conciseness, but are
very similar to the 2004 case.

It is found that the rank-size relationship for the classes 1, 2, and 4 can
be approximated by a straight line (Zipf's law), Eq. (3). The parameters
$\alpha$ and $\beta$ for these classes are given in Table 1. Note that for
the case of large cities the exponent $\beta$ is almost 1, as conjectured by
Zipf to be an optimal case. In contrast, the rank-size relationship for the
small cities in class 3 is very well approximated by a quadratic regression
of the kind

$$ \ln(r) = \alpha - \beta \ln(S) - \gamma (\ln(S))^2, \qquad (4) $$

where the corresponding parameters are given in Table 1. For comparison, the
parameters for the rank-size distribution of the Bulgarian cities for 2011
are also given in Table 1.

Year  [Pop]          $N_c$  Cl.  $\alpha$          $\beta$          $\gamma$
2004  1.14 × 10^6    16     1    -1.79 ± 0.19      0.99 ± 0.05      -
2004  70 000         122    2    -0.77 ± 0.03      0.820 ± 0.004    -
2004  5 000          93     3    -20.99 ± 0.14     6.47 ± 0.21      0.39 ± 0.02
2004  2 000          17     4    4.42 ± 0.06       0.128 ± 0.007    -
2011  1.21 × 10^6    15     1    -1.46 ± 0.18      0.89 ± 0.06      -
2011  70 000         120    2    -0.910 ± 0.002    0.837 ± 0.004    -
2011  5 000          94     3    -23.94 ± 0.19     7.16 ± 0.24      0.43 ± 0.03
2011  2 000          28     4    3.82 ± 0.07       0.204 ± 0.008    -

Table 1: Values of the parameters $\alpha$, $\beta$, and $\gamma$ for the
four classes of Bulgarian cities for 2004 and 2011; see Eqs. (3) and (4). The
number of cities $N_c$ in each Class (Cl.) is given together with the
respective population upper size ([Pop]) in the Class.
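The quadratic regression of Eq. (4) can be sketched as a three-parameter least-squares problem. The example below solves the normal equations directly, after centering ln(S) to keep the system well conditioned. The function name `fit_rank_size_quadratic` is hypothetical, and the input is synthetic: it is generated from the class 3 parameter values quoted for 2004, over an assumed range of ln(S), and is not the actual city data.

```python
def fit_rank_size_quadratic(log_sizes, log_ranks):
    """Least-squares fit of ln(r) = alpha - beta*ln(S) - gamma*ln(S)**2."""
    n = len(log_sizes)
    center = sum(log_sizes) / n
    u = [x - center for x in log_sizes]   # centering improves conditioning
    # Normal equations for ln(r) = d0 + d1*u + d2*u**2.
    m = [[sum(ui ** (j + k) for ui in u) for k in range(3)] for j in range(3)]
    v = [sum(y * ui ** j for ui, y in zip(u, log_ranks)) for j in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 3):
                m[row][k] -= factor * m[col][k]
            v[row] -= factor * v[col]
    d = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):                 # back substitution
        tail = sum(m[row][k] * d[k] for k in range(row + 1, 3))
        d[row] = (v[row] - tail) / m[row][row]
    # Undo the centering u = x - center to get c0 + c1*x + c2*x**2.
    c2 = d[2]
    c1 = d[1] - 2.0 * d[2] * center
    c0 = d[0] - d[1] * center + d[2] * center ** 2
    return c0, -c1, -c2                   # alpha, beta, gamma of Eq. (4)

# Synthetic check over an assumed range of ln(S) for small cities.
log_sizes = [-7.8 + 0.025 * i for i in range(37)]
log_ranks = [-20.99 - 6.47 * x - 0.39 * x * x for x in log_sizes]
alpha, beta, gamma = fit_rank_size_quadratic(log_sizes, log_ranks)
```

A value of gamma clearly different from zero, as reported for class 3, signals curvature in the log-log rank-size plot, i.e. a departure from a single power law.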

For completeness, the number of cities and their population ranges in each
class are given in Table 1. We emphasize that the grouping is not made a
priori but results from the data analysis. The fact that the analysis of
unbiased measurements yields consistent results over time indicates that the
findings are reliable.

Only 2004 and 2011 are shown here, but these are examples; the same findings
are confirmed in the other years. On this basis, the conclusion holds for the
entire studied interval from 2004 till 2011. This grouping is not unusual, as
such a separation into classes was also observed for other countries, cf.,
e.g., (Davis (1978); Rubinstein (2007)).

Thus, Fig. 1 and Table 1 indicate that Hypothesis 1 is not valid for the
Bulgarian city system.

Let us supply a further argument for this conclusion, through a Pareto plot
(Pareto (1896); Reed (2001); Reed and Jorgensen (2004)), i.e. let us examine
whether the cumulative distribution function of such a population system
follows an inverse power law of S.
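A Pareto plot of this kind can be sketched as follows: the empirical probability that a rescaled size is at least x is computed from the sorted sample, and a constant exponent is estimated from a straight-line fit in log-log coordinates. The function names, the minimum size `s_min` and the synthetic sample are assumptions made for illustration. The sketch uses P(S >= x) rather than P(S > x) so that the largest city does not produce a zero probability.

```python
import math

def empirical_ccdf(sizes):
    """Return (x_k, P(S >= x_k)) pairs for the sample sorted in ascending order."""
    xs = sorted(sizes)
    n = len(xs)
    return [(x, (n - k) / n) for k, x in enumerate(xs)]

def fit_power_law_exponent(ccdf):
    """Estimate zeta in P(S >= x) ~ x**(-zeta) from a log-log least-squares fit."""
    lx = [math.log(x) for x, p in ccdf]
    lp = [math.log(p) for x, p in ccdf]
    mx = sum(lx) / len(lx)
    mp = sum(lp) / len(lp)
    slope = (sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
             / sum((a - mx) ** 2 for a in lx))
    return -slope

# Synthetic sample: exact quantiles of a Pareto law with exponent 1.02
# above a hypothetical minimum size s_min (not Bulgarian data).
n, zeta_true, s_min = 200, 1.02, 1e-4
sample = [s_min * ((n - k) / n) ** (-1.0 / zeta_true) for k in range(n)]
zeta_hat = fit_power_law_exponent(empirical_ccdf(sample))
```

For real data, systematic curvature of the points around the fitted line is the visual signature that a single constant exponent is inadequate.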

The theory of size-independent growth predicts a power law, $P(S > x) \sim
x^{-\zeta}$, with a constant $\zeta$. The probability $P(S > x)$ that the
rescaled size of a Bulgarian city population exceeds a certain value $x$ in
2011 is shown in Fig. 2. In Fig. 2, this theory is represented by the dashed
line, i.e. a power law fit to the data. It is easily seen that a power law is
not a good approximation for the (rescaled) city population size distribution
in Bulgaria. An extended version of the theory of city growth should allow
for a population size dependent growth. As a consequence, the exponent
$\zeta$ depends on the population size of the city. Therefore, some
$\zeta(x)$ can be estimated as follows.

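The solid line in Fig. 2 represents the fit of the probability $P(S > x)$ for
the system of Bulgarian cities for 2011. This fit corresponds to the
distribution

$$ P(x) = \frac{a}{2}\, \mathrm{erfc}\left[ -\frac{\ln(x/b)}{c\sqrt{2}} \right] \qquad (5) $$

where $a$, $b$ and $c$ are parameters and $\mathrm{erfc}(x) = 1 -
\mathrm{erf}(x)$, where $\mathrm{erf}(x)$ is the error function

$$ \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \qquad (6) $$

In order to obtain $\zeta(x)$, the theoretical distribution
$(x/S_{min})^{-\zeta(x)}$ is assumed to be the same as Eq. (5), i.e.,

$$ \left( \frac{x}{S_{min}} \right)^{-\zeta(x)} = \frac{a}{2}\, \mathrm{erfc}\left[ -\frac{\ln(x/b)}{c\sqrt{2}} \right] \qquad (7) $$

From Eq. (7), the local exponent $\zeta(x)$ is obtained as

$$ \zeta(x) = -\frac{\ln(a/2) + \ln\{\mathrm{erfc}[-(\ln(x/b))/(c\sqrt{2})]\}}{\ln(x/S_{min})} \qquad (8) $$

The local exponent from Eq. (8) is shown in Fig. 3.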
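The local exponent of Eq. (8) is straightforward to evaluate numerically. The sketch below uses the values of a, b and c quoted in the caption of Fig. 2; the minimum rescaled size `S_MIN` is not given in the text, so the value used here is purely hypothetical, and the function names are likewise chosen for this example.

```python
import math

def fitted_ccdf(x, a, b, c):
    """Eq. (5): P(x) = (a/2) * erfc(-ln(x/b) / (c*sqrt(2)))."""
    return 0.5 * a * math.erfc(-math.log(x / b) / (c * math.sqrt(2.0)))

def local_exponent(x, a, b, c, s_min):
    """Eq. (8): zeta(x) = -ln(P(x)) / ln(x / s_min), defined for x > s_min."""
    if x <= s_min:
        raise ValueError("x must exceed s_min")
    return -math.log(fitted_ccdf(x, a, b, c)) / math.log(x / s_min)

A, B, C = 0.359, 0.0023, -1.469   # values quoted in the caption of Fig. 2
S_MIN = 1e-4                      # hypothetical; not given in the text
zetas = {x: local_exponent(x, A, B, C, S_MIN) for x in (1e-3, 1e-2, 1e-1)}
```

By construction, raising x/S_MIN to the power -zeta(x) reproduces the fitted curve of Eq. (5) at every x, so a size-independent growth process would show up as a constant value of zeta(x) across the sampled sizes.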

For a large interval of $x$ the value of $\zeta$ is close to 1, the Zipf
value. Large deviations from 1 are observed for the class of small cities. If
Hypothesis 1 were true for the whole system of Bulgarian cities, $\zeta$
should be a constant corresponding to a straight line in Fig. 3. This is not
the case, thereby implying that Hypothesis 1 is not valid for the system of
Bulgarian cities.


3. Test of Hypothesis 2 


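It is emphasized first that Hypothesis 2 is formulated for the non-rescaled
sizes of the populations in a system of cities. The test of the hypothesis
goes as follows: the city population size distribution for the Bulgarian
cities is fitted to the double Pareto log-normal distribution,

$$ p(x) = \frac{\alpha\beta}{2(\alpha+\beta)} \left\{ x^{-\alpha-1} \exp\left( \alpha\mu + \frac{\alpha^2\sigma^2}{2} \right) \left[ 1 + \mathrm{erf}\left( \frac{\ln(x) - \mu - \alpha\sigma^2}{\sigma\sqrt{2}} \right) \right] + x^{\beta-1} \exp\left( -\beta\mu + \frac{\beta^2\sigma^2}{2} \right) \left[ 1 - \mathrm{erf}\left( \frac{\ln(x) - \mu + \beta\sigma^2}{\sigma\sqrt{2}} \right) \right] \right\} \qquad (9) $$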
If the assumptions of Hypothesis 2 are correct, it should be expected that
the double Pareto log-normal distribution provides a good fit to the data.
However, if one or more of the assumptions in Hypothesis 2 are not valid, the
fit should not be a satisfactory one.

The distributions of population sizes of the system of Bulgarian cities in
2004 and in 2011 are presented in Fig. 4. The dashed line represents the
double Pareto log-normal distribution that corresponds to the distribution of
the Bulgarian cities in 2011. From the values of $\alpha$ and $\beta$ and
from the properties of the double Pareto log-normal distribution it follows
that $p(x) \sim x^{\beta-1}$ for the case of small city sizes and $p(x) \sim
x^{-\alpha-1}$ for the large cities. Similar results are found for the other
years for which the data are available.

The double Pareto log-normal distribution therefore fits the city population
size distributions of the Bulgarian city system well for every year studied.
Thus Hypothesis 2 is confirmed.
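The two power-law regimes of the double Pareto log-normal density can be checked numerically. The sketch below implements the density in its standard four-parameter form (tail exponents alpha and beta, log-normal location mu and scale sigma), written with erfc instead of erf to keep accuracy in the tails. The parameter values are hypothetical and chosen only for illustration; they are not the fitted values for Bulgaria.

```python
import math

def dpln_pdf(x, alpha, beta, mu, sigma):
    """Double Pareto log-normal density.

    Uses the identities 1 + erf(z) = erfc(-z) and 1 - erf(z) = erfc(z).
    """
    root2 = sigma * math.sqrt(2.0)
    z_upper = (math.log(x) - mu - alpha * sigma ** 2) / root2
    z_lower = (math.log(x) - mu + beta * sigma ** 2) / root2
    upper = (x ** (-alpha - 1.0)
             * math.exp(alpha * mu + 0.5 * (alpha * sigma) ** 2)
             * math.erfc(-z_upper))
    lower = (x ** (beta - 1.0)
             * math.exp(-beta * mu + 0.5 * (beta * sigma) ** 2)
             * math.erfc(z_lower))
    return alpha * beta / (2.0 * (alpha + beta)) * (upper + lower)

def log_slope(x, h=1e-3, **params):
    """Numerical d ln p / d ln x, i.e. the local power-law exponent of p."""
    hi = math.log(dpln_pdf(x * math.exp(h), **params))
    lo = math.log(dpln_pdf(x * math.exp(-h), **params))
    return (hi - lo) / (2.0 * h)

params = dict(alpha=2.0, beta=0.5, mu=0.0, sigma=1.0)  # hypothetical values
slope_large = log_slope(1e6, **params)    # approaches -alpha - 1
slope_small = log_slope(1e-6, **params)   # approaches beta - 1
```

The two limiting slopes are what distinguish this distribution from a single Pareto law: the upper tail is governed by alpha and the lower tail by beta.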


4. Concluding remarks 


The study of the size of populations in cities of a country has many economic
applications. Apart from decisions about locating industrial plants or
building trade centers, the city population sizes influence the efficient use
of resources or the possibilities of economic growth (Henderson (1986);
Henderson (2001)). Thus, the investigation of the evolution of the
populations in a system of cities is an important research topic.

The present study has shown that, for the case of the Bulgarian city system,
Hypothesis 1 is not confirmed but Hypothesis 2 is confirmed. The validity of
Hypothesis 2 even gives some information about the history of settlement
formation in Bulgaria. It seems that the distribution of the initial
populations of the settlements was indeed log-normal and that the formation
happened according to the Yule process, i.e. the new settlements have been
formed on the basis of the existing older settlements nearby.
Supplemental material


The supplemental material contains a brief review of the pertinent literature
on nonlinear dynamics, nonlinear time series analysis and their applications.
The two hypotheses tested in the main text, being closely connected to the
existence of power laws for the population sizes of the cities, are discussed
along analytical lines. It is shown that Kesten and Gibrat proportional
random growth processes lead to power law distributions. On the other hand,
the Yule process of settlement formation and the geometric Brownian motion
assumed for settlement growth are shown to result in a double Pareto
log-normal distribution.
Fig. 1 Rank-size relationship for the system of Bulgarian cities in 2004. The
cities are divided into 4 classes. The rank-size relationships for the
classes 1, 2, and 4 are fitted by a linear regression $\ln(r) = \alpha -
\beta \ln(S)$.

Fig. 2 Probability $P(S > x)$ that the rescaled population city size $S_i$ of
Bulgarian cities is larger than some value $x$ in 2004 (triangles), 2007
(squares), and 2011 (circles). Dashed line: Power law distribution with a
constant exponent $\zeta = 1.02$. Solid line: Eq. (5) with a = 0.359;
b = 0.0023; c = -1.469.

Fig. 3 Local exponent $\zeta(x)$ from Eq. (8) with parameter values of Fig. 2.
The vertical lines mark the boundaries between the 4 classes of cities from
large to small sizes with increasing x.

Fig. 4 Probability density functions for Bulgarian city population size in
2004 (circles) and 2011 (squares) with a double Pareto log-normal distribution
fit, Eq. (9) (dashed line), for 2011. The parameters of the double Pareto
log-normal distribution from the dashed line are: $\alpha = 2.049$; $\beta =
0.216$; $\sigma = -6.188$; $\mu = -25.737$; $R^2 = 0.996$ in 2011.